Low Assumptions, High Dimensions

Author

  • Larry Wasserman
Abstract

These days, statisticians often deal with complex, high-dimensional datasets. Researchers in statistics and machine learning have responded by creating many new methods for analyzing high-dimensional data. However, many of these new methods depend on strong assumptions. The challenge of bringing low-assumption inference to high-dimensional settings requires new ways to think about the foundations of statistics. Traditional foundational concerns, such as the Bayesian versus frequentist debate, have become less important.

1. In the Olden Days

There is a joke about media bias from the comedian Al Franken: "To make the argument that the media has a left- or right-wing, or a liberal or a conservative bias, is like asking if the problem with Al-Qaeda is: do they use too much oil in their hummus?" I think a similar comment could be applied to the usual debates in the foundations of statistical inference. The important foundational questions are not 'Bayes versus frequentist' or 'objective Bayesian versus subjective Bayesian'. To me, the most pressing foundational question is: how do we reconcile the two most powerful needs in modern statistics, the need to make methods assumption-free and the need to make methods work in high dimensions? Methods that hinge on weak assumptions are always valuable, but this is especially so in high-dimensional problems. The Bayes-frequentist debate is not irrelevant, but it is not as central as it once was. I'll discuss Bayesian inference in Section 4.

Our search for low-assumption, high-dimension methods is complicated by the fact that our intuition in high dimensions is often misguided. In the olden days, statistical models had low dimension d and large sample size n. These models guided our intuition, but that intuition is inadequate for modern data where d > n. An analogy from physics is helpful.
Physics was initially guided by simple thought (and real) experiments about falling apples, balls rolling down inclined planes and moving objects bumping into each other. This approach guided physics successfully for a while. But modern physics (quantum mechanics, fields


Similar articles

Risk Bounds For Mode Clustering

Density mode clustering is a nonparametric clustering method. The clusters are the basins of attraction of the modes of a density estimator. We study the risk of mode-based clustering. We show that the clustering risk over the cluster cores — the regions where the density is high — is very small even in high dimensions. And under a low noise condition, the overall cluster risk is small even bey...
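The abstract above defines clusters as the basins of attraction of the modes of a density estimator. As a hedged illustration only (not the paper's own estimator or risk analysis), the standard mean-shift algorithm recovers these basins for a Gaussian kernel density estimate; the bandwidth `h`, step count, and merging tolerance below are illustrative choices.

```python
# Sketch of density mode clustering via mean shift: each point is
# moved uphill on a Gaussian KDE, and points that converge to the
# same mode share a cluster. Parameters here are illustrative.
import numpy as np

def mean_shift(X, h=1.0, steps=50):
    """Iterate each point toward the nearest mode of the KDE of X."""
    Z = X.copy()
    for _ in range(steps):
        for i, z in enumerate(Z):
            # Gaussian kernel weights of all data points around z.
            w = np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * h ** 2))
            Z[i] = w @ X / w.sum()  # weighted mean = one uphill step
    return Z

def cluster_labels(Z, tol=1e-1):
    """Group converged points whose modes coincide within tol."""
    modes, labels = [], []
    for z in Z:
        for j, m in enumerate(modes):
            if np.linalg.norm(z - m) < tol:
                labels.append(j)
                break
        else:
            modes.append(z)
            labels.append(len(modes) - 1)
    return np.array(labels)

# Two well-separated blobs in the plane: mean shift finds two modes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)),
               rng.normal(5, 0.3, (30, 2))])
labels = cluster_labels(mean_shift(X, h=0.8))
```

Inside a "cluster core" (a region of high density, in the abstract's terms), the uphill trajectories are stable, which is the intuition behind the small clustering risk claimed there.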

Full text

Low-Noise Density Clustering

We study density-based clustering under low-noise conditions. Our framework allows for sharply defined clusters such as clusters on lower dimensional manifolds. We show that accurate clustering is possible even in high dimensions. We propose two data-based methods for choosing the bandwidth and we study the stability properties of density clusters. We show that a simple graph-based algorithm kn...

Full text

Discussion: "A Significance Test for the Lasso"

The paper by Lockhart, Taylor, Tibshirani and Tibshirani (LTTT) is an important advancement in our understanding of inference for high-dimensional regression. The paper is a tour de force, bringing together an impressive array of results, culminating in a set of very satisfying convergence results. The fact that the test statistic automatically balances the effect of shrinkage and the effect of...

Full text

Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures

We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with two non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for le...

Full text

Riemannian Geometry and Statistical Machine Learning

Statistical machine learning algorithms deal with the problem of selecting an appropriate statistical model from a model space Θ based on a training set {x_i}_{i=1}^n ⊂ X or {(x_i, y_i)}_{i=1}^n ⊂ X × Y. In doing so they either implicitly or explicitly make assumptions on the geometries of the model space Θ and the data space X. Such assumptions are crucial to the success of the algorithms as different geo...

Full text


Journal title:

Volume   Issue

Pages  -

Publication date: 2011